DSC 140B
Problems tagged with convolutional neural networks

Problem #161

Tags: lecture-15, convolutional neural networks

A grayscale image of size \(32 \times 32 \times 1\) is convolved with a filter of size \(5 \times 5\). No padding is applied, and the stride is 1. What is the shape of the output response map?

Solution

\(28 \times 28 \times 1\).

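With no padding and stride 1, the output height and width are each \(32 - 5 + 1 = 28\). The filter slides one pixel at a time over every \(5 \times 5\) window that fits entirely inside the image, left to right and top to bottom, and there are 28 such positions along each axis. Each position produces one output value, and since a single filter is applied, the output has 1 channel.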
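The same count follows from the standard output-size formula \(\lfloor (n + 2p - f)/s \rfloor + 1\) for input side \(n\), filter side \(f\), padding \(p\), and stride \(s\). A minimal Python sketch (the function name `conv_output_size` is our own, not from any library):

```python
def conv_output_size(n: int, f: int, stride: int = 1, padding: int = 0) -> int:
    """Number of valid filter positions along one spatial axis.

    n: input side length, f: filter side length,
    stride: step between positions, padding: zeros added on each side.
    """
    return (n + 2 * padding - f) // stride + 1


# 32x32x1 image, 5x5 filter, stride 1, no padding; one filter -> 1 output channel
side = conv_output_size(32, 5)
print((side, side, 1))  # (28, 28, 1)
```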

Problem #162

Tags: lecture-15, convolutional neural networks

An input \(5 \times 5\) grayscale image \((I)\) is represented by the matrix below.

\[ I = \begin{pmatrix} 0.2 & 0.1 & 0.4 & 0 & 0.3 \\ 0 & 0.5 & 0.2 & 0.7 & 0 \\ 0.3 & 0 & 0.6 & 0.1 & 0.5 \\ 0.1 & 0.4 & 0 & 0.3 & 0.2 \\ 0 & 0.2 & 0.5 & 0 & 0.4 \end{pmatrix}\]

Suppose you convolve \(I\) with the \(3 \times 3\) filter

\[ F = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}\]

to get the response map \(I'\) (with stride 1 and no padding). What is the value of \(I'_{11}\), the entry in the 1st row and 1st column of the output?

Solution

\(I'_{11} = 0.6\).

The \(3 \times 3\) patch at the top-left corner of \(I\) is:

\[\begin{pmatrix} 0.2 & 0.1 & 0.4 \\ 0 & 0.5 & 0.2 \\ 0.3 & 0 & 0.6 \end{pmatrix}\]

Multiplying this patch element-wise by the filter and summing:

$$\begin{align*} I'_{11}&= 0.2 \cdot 1 + 0.1 \cdot 0 + 0.4 \cdot(-1) + 0 \cdot 0 + 0.5 \cdot 1 + 0.2 \cdot 0 \\&\quad + 0.3 \cdot(-1) + 0 \cdot 0 + 0.6 \cdot 1 \\&= 0.2 - 0.4 + 0.5 - 0.3 + 0.6 \\&= 0.6 \end{align*}$$
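The calculation can be checked with NumPy. This sketch computes the full response map using the usual deep-learning convention of sliding the filter without flipping it (cross-correlation); because this particular \(F\) is unchanged by a 180° rotation, flipping it first would give the same answer. The helper name `conv2d_valid` is hypothetical.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, filt: np.ndarray) -> np.ndarray:
    """Slide `filt` over `image` with stride 1 and no padding (no flip)."""
    h, w = image.shape
    fh, fw = filt.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Element-wise product of the patch and the filter, then sum
            out[r, c] = np.sum(image[r:r + fh, c:c + fw] * filt)
    return out

I = np.array([
    [0.2, 0.1, 0.4, 0.0, 0.3],
    [0.0, 0.5, 0.2, 0.7, 0.0],
    [0.3, 0.0, 0.6, 0.1, 0.5],
    [0.1, 0.4, 0.0, 0.3, 0.2],
    [0.0, 0.2, 0.5, 0.0, 0.4],
])
F = np.array([
    [1, 0, -1],
    [0, 1, 0],
    [-1, 0, 1],
])

I_prime = conv2d_valid(I, F)
print(I_prime.shape)             # (3, 3)
print(round(I_prime[0, 0], 4))   # 0.6  (I'_11 is index [0, 0] in 0-based NumPy)
```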

Problem #163

Tags: lecture-15, convolutional neural networks

An input image has shape \(80 \times 80 \times 11\), where \(11\) is the number of channels. We wish to convolve this image with a 3D filter of shape \(5 \times 5 \times k\). What must the value of \(k\) be for the convolution to work?

Solution

\(k = 11\).

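A 3D convolution filter must have the same number of channels as the input. The filter slides only across the height and width of the image; at each position it computes a dot product between the entire \(5 \times 5 \times k\) filter and the \(5 \times 5 \times 11\) block of the input beneath it, covering all channels at once. That element-wise product is only defined when the two depths match, so the filter's third dimension must equal the input's channel count.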
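A small NumPy sketch of the channel-matching rule, assuming the image is stored in height x width x channels order and filled with random values (the data and the helper name `conv3d_valid` are illustrative). Each position yields one number, so a single 3D filter produces a one-channel map.

```python
import numpy as np

def conv3d_valid(image: np.ndarray, filt: np.ndarray) -> np.ndarray:
    """Apply one fh x fw x k filter to an h x w x c image, stride 1, no padding."""
    h, w, c = image.shape
    fh, fw, k = filt.shape
    if k != c:
        raise ValueError(f"filter depth {k} must equal input channels {c}")
    out = np.zeros((h - fh + 1, w - fw + 1))
    for r in range(out.shape[0]):
        for col in range(out.shape[1]):
            # Dot product over the full fh x fw x c block: all channels at once
            out[r, col] = np.sum(image[r:r + fh, col:col + fw, :] * filt)
    return out

rng = np.random.default_rng(0)
image = rng.random((80, 80, 11))

print(conv3d_valid(image, rng.random((5, 5, 11))).shape)  # (76, 76)

try:
    conv3d_valid(image, rng.random((5, 5, 3)))
except ValueError as err:
    print(err)  # filter depth 3 must equal input channels 11
```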

Problem #164

Tags: lecture-15, convolutional neural networks

Consider a convolutional neural network with the following architecture. The input is a \(10 \times 10 \times 1\) grayscale image. It passes through Conv layer 1 (3 filters of size \(3 \times 3\), stride 1, no padding), producing an output of shape \(8 \times 8 \times 3\). Then \(2 \times 2\) max pooling is applied, producing an output of shape \(4 \times 4 \times 3\). Next is Conv layer 2 (5 filters of size \(3 \times 3 \times 3\), stride 1, no padding), producing an output of shape \(2 \times 2 \times 5\). This output is flattened into a layer of \(n\) nodes, which is fully connected to an output layer with 1 node.

Part 1)

What is the value of \(n\)?

Solution

\(n = 20\).

The output of Conv layer 2 is \(2 \times 2 \times 5\). Flattening this gives \(2 \times 2 \times 5 = 20\) values, one per node, so the flattened layer has \(n = 20\) nodes.
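The shape bookkeeping for this architecture can be traced in a few lines of Python. The sketch assumes \(2 \times 2\) max pooling uses stride 2 (non-overlapping windows), which is what halves \(8 \times 8\) to \(4 \times 4\); the helper names `conv_shape` and `pool_shape` are hypothetical.

```python
def conv_shape(h, w, num_filters, f, stride=1, padding=0):
    """Output (height, width, channels) of a conv layer with num_filters f x f filters."""
    out_h = (h + 2 * padding - f) // stride + 1
    out_w = (w + 2 * padding - f) // stride + 1
    return out_h, out_w, num_filters

def pool_shape(h, w, c, size=2):
    """Non-overlapping max pooling: each spatial side shrinks by `size`; channels unchanged."""
    return h // size, w // size, c

shape = (10, 10, 1)
shape = conv_shape(shape[0], shape[1], num_filters=3, f=3)  # (8, 8, 3)
shape = pool_shape(*shape)                                  # (4, 4, 3)
shape = conv_shape(shape[0], shape[1], num_filters=5, f=3)  # (2, 2, 5)

n = shape[0] * shape[1] * shape[2]
print(shape, n)  # (2, 2, 5) 20
```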

Part 2)

What is the total number of learnable parameters in the network, excluding biases?

Solution

\(182\).

Conv layer 1 has 3 filters of shape \(3 \times 3\) (depth 1, matching the single input channel), each with \(9\) weights, for \(3 \times 9 = 27\) parameters. Conv layer 2 has 5 filters of shape \(3 \times 3 \times 3\), each with \(27\) weights, for \(5 \times 27 = 135\) parameters. The flattened layer of 20 nodes is fully connected to the single output node, contributing \(20 \times 1 = 20\) parameters. The grand total is \(27 + 135 + 20 = 182\).

Note that max pooling has no learnable parameters.
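The same tally as a short Python sketch (weights only, biases excluded, as the question asks). A conv layer's weight count is filters x height x width x input channels; a fully connected layer's is inputs x outputs. The function names are illustrative.

```python
def conv_weights(num_filters, f, in_channels):
    # Each filter spans f x f spatially and all input channels
    return num_filters * f * f * in_channels

def dense_weights(n_in, n_out):
    # One weight per (input, output) pair
    return n_in * n_out

conv1 = conv_weights(num_filters=3, f=3, in_channels=1)  # 27
conv2 = conv_weights(num_filters=5, f=3, in_channels=3)  # 135
fc = dense_weights(n_in=20, n_out=1)                     # 20
pool = 0  # max pooling has nothing to learn

total = conv1 + conv2 + fc + pool
print(total)  # 182
```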

Problem #165

Tags: lecture-15, convolutional neural networks

An input \(4 \times 4\) grayscale image \((I)\) is represented by the matrix below.

\[ I = \begin{pmatrix} 0.7 & 0.2 & 0.1 & 0.8 \\ 0.3 & 0.5 & 0.4 & 0.2 \\ 0.6 & 0.1 & 0.9 & 0.3 \\ 0.2 & 0.8 & 0.5 & 0.6 \end{pmatrix}\]

\(2 \times 2\) max pooling is applied to this image. What is the resulting output?

Solution

\(\begin{pmatrix} 0.7 & 0.8 \\ 0.8 & 0.9 \end{pmatrix}\).

With \(2 \times 2\) max pooling, we divide the \(4 \times 4\) image into four non-overlapping \(2 \times 2\) blocks and take the maximum of each. Top-left: \(\max\{0.7, 0.2, 0.3, 0.5\} = 0.7\). Top-right: \(\max\{0.1, 0.8, 0.4, 0.2\} = 0.8\). Bottom-left: \(\max\{0.6, 0.1, 0.2, 0.8\} = 0.8\). Bottom-right: \(\max\{0.9, 0.3, 0.5, 0.6\} = 0.9\).
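For non-overlapping \(2 \times 2\) pooling, NumPy can handle all four blocks at once by reshaping the \(4 \times 4\) array so that each block gets its own pair of axes, then taking the maximum over those axes. A minimal sketch, assuming the image height and width are even (the name `max_pool_2x2` is our own):

```python
import numpy as np

I = np.array([
    [0.7, 0.2, 0.1, 0.8],
    [0.3, 0.5, 0.4, 0.2],
    [0.6, 0.1, 0.9, 0.3],
    [0.2, 0.8, 0.5, 0.6],
])

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2; assumes even height and width."""
    h, w = x.shape
    # Axes after reshape: (block row, row within block, block col, col within block)
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

print(max_pool_2x2(I))
# [[0.7 0.8]
#  [0.8 0.9]]
```

The reshape trick only works for non-overlapping windows; overlapping pooling (stride smaller than the window) needs an explicit sliding loop like the convolution above.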